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Abstract 

To estimate geometrically regular images in the white noise model and obtain an adap- 
tive near asymptotic minimaxity result, we consider a model selection based bandlet 
estimator. This bandlet estimator combines the best basis selection behaviour of the 
model selection and the approximation properties of the bandlet dictionary. We de- 
rive its near asymptotic minimaxity for geometrically regular images as an example of 
model selection with general dictionary of orthogonal bases. This paper is thus also a 
self contained tutorial on model selection with orthogonal bases dictionary. 
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1. Introduction 

A model selection based bandlet estimator has been introduced by Peyre et al. lEoll 
to reduce white noise added to images having a geometrical regularity. This estima- 
tor projects the observations on orthogonal bandlet vectors selected in a dictionary of 
orthonormal bases. This paper shows that the risk of this estimator is nearly asymp- 
totically minimax for geometrically regular images. It is also a tutorial on estimation 
with general dictionary of orthogonal bases, through model selection. It explains with 
details how to build a thresholding estimator in an adaptively chosen "best" basis and 
analyzes its performance with the model selection approach of Barron et al. [2]. 

Section [2] describes the statistical setting of the white noise model, and introduces 
the model of geometrically regular images. Images in this class, originally pro- 
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posed by Korostelev and Tsybakov 13], are, roughly, (Holder regularity a) out- 
side a set of curves in [0, 1]^. Korostelev and Tsybakov 1 14] prove that the minimax 
quadratic risk over this class, for a Gaussian white noise of variance cr^, has an asymp- 
totic decay of the order of cr^a/la+i) xhey show that the risk of any possible estimator 
cannot decay faster than this rate uniformly for all functions of this class and exhibit 
an estimator that achieves this rate. Their estimator relies on the knowledge of the 
regularity exponent a and on an explicit detection of the contours, and is not stable 
relatively to any image blurring. Later, Donoho [10] overcomes the detection issue by 
replacing it with an well-posed optimization problem. Nevertheless, both use a model 
of images with sharp edges which limits their applications since most image edges are 
not strict discontinuities. They are blurred because of various diffraction effects which 
regularize discontinuities by unknown factors. 

The model selection based bandlet estimator, which can also be described as a 
thresholding estimator in a best bandlet basis, does not have this restriction. It does not 
rely on the detection of the precise localization of an edge but only of a looser local 
direction of regularity. Furthermore, these directions of regularity are not estimated 
directly but indirectly through a best orthogonal basis search algorithm which does 
not require to know the regularity parameter a. Section [3] gives a tutorial introduction 
of this type of estimators for arbitrary dictionary. This generic class of thresholding 
estimators in a best basis selected in a dictionary of orthonormal bases has been already 
studied by Donoho and Johnstone [11] and fit into the framework of Barron et al. [2], 
ll] and lll 8l] This (self contained) section recalls the framework of these estimators and 
their theoretical performance. For the sake of completeness, a simplified proof of the 
main model selection result is given in Appendix. 

Section IHreturns to the specific setting of image processing and applies the results 
of the previous section to geometric image estimation. The choice of the representation 
(the choice of the dictionary of orthogonal bases) becomes crucial and, after a short 
description of the bandlet bases, their use is justified. The paper is concluded with 
Theorem[3]which states the adaptive near asymptotic minimaxity of the selection model 
based bandlet estimator for geometrically regular images. 

2. Image estimation 

2.1. White noise model and acquisition 

During the digital acquisition process, a camera measures an analog image / with a 
filtering and sampling process, which introduces an additive white noise. In this white 
noise model, the process that is observed can be written 

dX, = /(x)dx+(TdWx, 

where Wx is the Wiener process and cr is a known noise level parameter. This equation 
means that one is able to observe a Gaussian field Xg indexed by functions g G of 
mQdin E{Xg) = (f^g) and covariance £^ [^.g^g'] — {§^§^)- 

This model allows to consider asymptotics over cr of a discrete camera measure- 
ments process. Indeed, the measurement of a camera with pixels can be modelled 
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as the measure of X^^ over a family 0„ of N impulse responses of the photo- sensors. 
Those measurements, 

X^^ = {f,(^n)^CJWc^JorO<n<N 

where Wg is a Gaussian field of zero mean and covariance E [WgWg'] = define 
a "projection" of our observation dX on the space V/v, spanned by the 0^, that we 
denote /V^X. The white noise model allows to modify the resolution of the camera 
depending on the noise level. To simplify explanations, in the following we suppose 
that {(l)n}o<n<N is an orthogonal basis, with no loss of generality, and thus that 

N 

2.2. Minimax risk and geometrically regular images 

We study the maximum risk of estimators for images / in a given class with re- 
spect to G. Model classes are often derived from classical regularity spaces (C^ spaces, 
Besov spaces,...). This does not take into account the existence of geometrically regu- 
lar structures such as edges. This paper uses a geometric image model appropriate for 
edges, but not for textures, where images are considered as piecewise regular functions 
with discontinuities along regular curves in [0, 1]^. This geometrical image model has 
been proposed by Korostelev and Tsybakov |[l4ll in their seminal work on image es- 
timation. It is used as a benchmark to estimate or approximate images having some 
kind of geometric regularity (Donoho Shukla et al. |21],...). An extension of this 
model that incorporates a blurring kernel h has been proposed by Le Pennec and Mal- 
lat |[l6l] to model the various diffraction effects. The resulting class of images studied 
in this paper is the set of geometrically regular images specified by the following 
definition. 

Definition 1. A function f G i-^([0, 1]^) is geometrically regular over [0, 1]^ if 

• f = forf = f^h with f e C«(A) for A = [0, 1]^ - {^^}i<^<g, 

• the blurring kernel h is C^, compactly supported in [—s^s]^ and ||/i||c« ^ 

• the edge curves are and do not intersect tangentially ifa> 1. 

2.3. Edge based estimation 

Korostelev and Tsybakov [l3] have built an estimator that is asymptotically mini- 
max for geometrically regular functions /, as long as there is no blurring and hence that 
h = 5. With a detection procedure, they partition the image in regions where the image 
is either regular or which include a "boundary fragment" corresponding to the subpart 
of a single discontinuity curve. In each region, they use either an estimator tailored to 
this "boundary fragments" or a classical kernel estimator for the regular regions. This 
yields a global estimate F of the image /. If the / is outside the boundaries and if 
the parametrization of the curve is also then there exists a constant C such that 
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This rate of convergence achieves the asymptotic minimax rate for uniformly func- 
tions and thus the one for geometrically regular functions that includes this class. 
This means that sharp edges do not alter the rate of asymptotic minimax risk. However, 
this estimator is not adaptive relatively to the Holder exponent a that must be known 
in advance. Furthermore, it uses an edge detection procedure that fails when the image 
is blurred or when the discontinuity jumps are not sufficiently large. 

Donoho [10] and Shukla et al. [21] reuse the ideas of "boundary fragment" under 
the name "horizon model" to construct a piecewise polynomial approximation of im- 
ages. They derive efficient estimators optimized for a G [1,2]. These estimators use 
a recursive partition of the image domain in dyadic squares, each square being split 
in two parts by an edge curve that is a straight segment. Both optimize the recursive 
partition and the choice of the straight edge segment in each dyadic square by mini- 
mizing a global function. This process leads to an asymptotically minimax estimator 
up to a logarithmic factor which is adaptive relatively to the Holder exponent as long 
as a G [1,2]. 

Korostelev and Tsybakov 1 14] as well as Donoho 1 10] and |21] rely on the sharp- 
ness of image edges in their estimators. In both cases, the estimator is chosen amongst 
a family of images that are discontinuous across parametrized edges, and these estima- 
tors are therefore not appropriate when the image edges are blurred. We now consider 
estimators that do not have this restriction: they project the observation on adaptive 
subspaces in which blurred as well as sharp edges are well represented. They rely on 
two ingredients: the existence of bases in which geometrical images can be efficiently 
approximated and the existence of a mechanism to select, from the observation, a good 
basis and a good subset of coefficients onto which it suffices to project the observation 
to obtain a good estimator. We focus first on the second issue. 

3. Projection Estimator and Model Selection 

The projection estimators we study are decomposed in two steps. First a linear 
projection reduces the dimensionality of the problem by projecting the signal in a finite 
dimensional space. This first projection is typically performed by the digital acquisition 
device. Then a non-linear projection estimator refines this projector by reprojecting the 
resulting finite dimensional observation in a space that is chosen depending upon this 
observation. This non-linear projection is obtained with a thresholding in a best basis 
selected from a dictionary of orthonormal bases. Best basis algorithms for noise re- 
moval have been introduced by Coifman and Wickerhauser [8]. As recalled by Candes 
10], their risks have already been studied by Donoho and Johnstone [11] and are a spe- 
cial case of the general framework of model selection proposed by Birge and Massart 
0]. Note that Kolaczyk and Nowak [13] have studied a similar problem in a slightly 
different setting. We recall in this section the framework of model selection and state a 
selection model theorem (Theorem [T]) that is the main statistical tool to prove the per- 
formance on the model selection based bandlet estimator. This section is intended as 
a self contained tutorial presentation of these best basis estimators and their resulting 
risk upper bounds and contains no new results. Nevertheless, a simple (novel) proof of 
the (simplified) main result is given in Appendix. 
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3.1. Approximation space and further projection 

The first step of our estimators is a projection in a finite dimension space Viv 
spanned by an orthonormal family {(^^}o<w<iv. The choice of the dimension N and 
of the space Va^ depends on the noise level a but should not depend on the function / 
to be estimated. Assume for now that Vj^ is fixed and thus that we observe /V^v^- This 
observation can be decomposed into Py^^f + oWvj^ where Wy^^ is a finite dimensional 
white noise on Vj^. 

Our final estimator is a reprojection of this observation Py^^X onto a subspace ^ C 
Viv which may (and will) depend on the observation: the projection based estimator 
P^Py^X = P^X. The overall quadratic error can be decomposed in three terms: 

\\f-P^xf = \\f-PyM^^\\Pv,f-P^ff^o^P^W\\\ 

The first term is a bias term corresponding to the first linear approximation error due 
to the projection on Viv, the second term is also a bias term which corresponds to the 
non linear approximation of Py^f on ^ while the third term is a "variance" term 
corresponding to the contribution of the noise on 

The dimension N of Va^ has to be chosen large enough so that with high probability, 
for reasonable Wf-Py^fW^ < \\Py^f-P^f\\^ + \\P^W\\^. From the practical 
point of view, this means that the acquisition device resolution is set so that the first 
linear approximation error due to discretization is smaller than the second non linear 
noise related error. Engineers often set so that both terms are of the same order of 
magnitude, to limit the cost in terms of storage and computations. In our white noise 
setting, we will explain how to chose N depending on a. 

For a fixed V^v, in order to obtain a small error, we need to balance between the two 
remaining terms. A space ^ of large dimension may reduce the second bias term but 
will increase the variance term, a space ^ of small dimension does the opposite. It is 
thus necessary to find a trade-off between these two trends, and select a space ^ to 
minimize the sum of those two terms. 

3.2. Model Selection in a Dictionary of orthonormal bases 

We consider a (not that) specific situation in which the space ^ is spanned by 
some vectors from some orthonormal bases of Vn. More precisely, let ^ = {gn}o<n<N 
be an orthonormal basis of V^v, that may be different from {0„}, we consider space ^ 
spanned by a sub-family {gnk}i<k<M of ^ vectors and the projections of our observa- 
tion on those spaces 

M 
k=l 

Note that this projection, or more precisely its decomposition in the basis {0^}? can be 
computed easily from the decomposition of P^X in the same basis. 

Instead of choosing a specific single orthonormal basis we define a dictionary 
which is a collection of orthonormal bases in which we choose adaptively the basis 
used. Note that some bases of n^ay have vectors in common. This dictionary can 
thus also be viewed as set {g^} of > N different vectors, that are regrouped to 
form many different orthonormal bases. Any collection of M vectors from the same 
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orthogonal basis ^ G generates a space ^ that defines a possible estimator Pj^X 
of /. Let = {^y}rN be the family of all such projection spaces. Ideally we would 
like to find the space ^ which minimizes || / — P^X\\. We want thus to choose 
a "best" model ^ amongst a collection that is we want to perform a model selection 
task. 

3.3. Oracle Model 

As a projection estimator yields an estimation error 

\\f-P^xf = \\f-Pv,f + \\Pv, - P.^ff + WP^Wf = \\f-P.^ff + WP^wf, 
the expected error of such an estimator is given by 

E[\\f-P^xf]=\\f-P^ff^a^dim{^). 

The best subspace for this criterion is the one that realizes the best trade-off between 
the approximation error || / — P^f\\'^ and the complexity of the models measured by 

This expected error cannot be computed in practise since we have a single realiza- 
tion of dX (or of Pvj^X) . To (re)derive the classical model selection procedure of Birge 
and Massart [3], we first slightly modify our problem by searching for a subspace ^ 
such that the estimation error obtained by projecting Pvj^X on this subspace is small 
with only an overwhelming probability. As in all model selection papers, we use an 
upper bound of the estimation error obtained from an upper bound of the energy of the 
noise projected on Each of the projections of the noise on the different 
vectors in the bases of the dictionary is thus Wgj^gk- Its law is a Gaussian random 
variable of variance cr^ along the vector g^. A standard large deviation result proves 
that the norms of such Gaussian random variables are bounded simultaneously by 
T = a^llogKN with a probability that tends to 1 when N increases. Since the noise 
energy projected in ^ is the sum of dim (^) squared dictionary noise coefficients, 
we get \\P^W\\^ < dim(^) T^. It results that 

\\f-P^Xf<\\f-P^ff^dim{^)T\ (1) 

over all subspaces ^ with a probability that tends to 1 as increases. The estimation 
error is small if is a space of small dimension dim {^) which yields a small ap- 
proximation error || / — P^f\\. We denote by ^ the space that minimizes the 
estimation error upper bound ([B 

^o = arg min {\\f - P^ff ^ dim {^) T^). 

Note that this optimal space cannot be determined from the observation X since / is 
unknown. It is called the oracle space , hence the O in the notation, to remind this fact. 
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3.4. Penalized empirical error 

To obtain an estimator, it is thus necessary to replace this oracle space by a "best" 
space obtained only from the observation Py^X that yields (hopefully) a small estima- 
tion error. A first step toward this goal is to notice that since all the spaces ^ are 
included into Va^, minimizing 

\\f-Pj^ff^dim{J^) 

is equivalent to minimizing 

WPyJ-Pj^fW^^&MJ^) 

. A second step is to consider the crude estimation of ||/V^/ — given by the 
empirical norm 

\\Pv,X-P,^Xf=\\Pv,Xf-\\P^X\\\ 

This may seem naive because estimating \\Pvj^f — Pj^fW^ with ||/Viv^ — Pj^X\\^ yields 
a large error 

\\Pv,X-P,^X\\^-\\PvJ-P,^f\\^ = (||/V„xf - ||/V^/f) + (||P^/||2_ llp^xf ), 

whose expected value is {N — dim{^))a^, with typically dim{^) <C N. However, 
most of this error is in the first term on the right hand-side, which has no effect on 
the choice of space This choice depends only upon the second term and is thus 
only influenced by noise projected in the space ^ of lower dimension dim {^). The 
bias and the fluctuation of this term, and thus the choice of the basis, are controlled by 
increasing the parameter T. 

We define the best empirical projection estimator P^sls the estimator that mini- 
mizes the resulting empirical penalized risk: 

^=arg min \\Pv^X - P^xf ^dim{^) (2) 

3.5. Thresholding in a best basis 

Finding the best estimator which minimizes ^ may seem computationally un- 
tractable because the number of possible spaces ^ is typically an exponential 
function of the number of vectors in We show that this best estimator may 
however be found with a thresholding in a best basis. 

Suppose that we impose that ^ are generated by a subset of vectors from a basis 
^ G The following (classical) lemma proves that among all such spaces, the best 
projection estimator is obtained with a thresholding at T. 

Lemma 1. Among all spaces ^ that are generated by a subset of vectors of an 
orthonormal basis ^ = {gn}o<n<N ofV^, the estimator which minimizes \\Pvn^ — 
+ dim(.y#) is the thresholding estimator 

P^^XT^= L {^^Sn)gn- (3) 

n,\{X,gn)\>T 
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Proof. Let ^ = Span{g„}„e/ with / C [0, A/^), as ^ is an orthonormal basis, 

||x - p^x f + dim (^) 72 = £ I {x,^„) |2 + ^ r2 

which is minimal if / = | p > T^}. □ 

The thresholding estimator ^ projects X in the space ^^^x,t generated by the 
vectors {gm}\{x,g,n)\>T^ the vectors of ^ which produce coefficients above threshold. 
This lemma implies that best projection estimators are necessarily thresholding esti- 
mators in some basis. Minimizing ||/Va^X — + dim{^) over ^ G is 
thus equivalent to find the basis ^ of Vn which minimizes the thresholding penalized 
empirical risk: 

J = arg min \\Pv^X - P^^^^xf ^dim{^) t\ 

The best space which minimizes the empirical penalized risk in ([2]) is derived from a 
thresholding in the best basis ^ = y.. 

The following theorem, similar to the one obtained first by Barron et al. proves 
that the thresholding estimation error in the best basis is bounded by the estimation 
error by projecting in the oracle space up to a multiplicative factor. 

Theorem 1. There exists an absolute function Xo{K) > and some absolute con- 
stants £ > and K > such that if we denote = {^yjr the family of projec- 
tion spaces generated by some vectors in an orthogonal basis of a dictionary and 
denote be the number of different vectors in Then for any cr > 0, if we let 
T = X yJ\og{Kj^) a with A > Ao(^a^), then for any f G L^, the thresholding estimator 
F = P X in the best basis 

J = arg min \\Pv^X - P^^^^^^Xf + dim (^^,x,r) 

satisfies 

<(! + £) f min ||/-P^/||2 + dim(^) r 

For the sake of completion, we propose in Appendix a simple proof of Theorem[TJ 
inspired by Birge and Massart [3], which requires only a concentration lemma for 
the norm of the noise in all the subspaces spanned by the generators of Q)^ but 
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with worse constants: Ao(^) = 32 + \og(K) ' ^ — ^ ^ — ^4. Note this Theorem 
can be deduced from Massart |18] with different (better) constant (and for roughly 
using a more complex proof based on subtle Talagrand's inequalities. It 
results that any bound on min^^^^ \\f — Pj^f\\^ + dim T^, gives a bound on the 
risk of the best basis estimator F. 

To obtain a computational estimator, the minimization 

J = argmin \\Pv^X - Pj^^^^X\\^ ^&im{j^^^xj) , 
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should be performed with a number of operations typically proportional to the number 
Kf^ of vectors in the dictionary. This requires to construct appropriate dictionaries of 
orthogonal bases. Examples of such dictionaries have been proposed by Coifman and 
Wickerhauser |8] with wavelet packets or by Coifman and Meyer |7] with local co- 
sine bases for signals having localized time-frequency structures. Next section reviews 
some possible dictionaries for images and recall the construction of the dictionary of 
bandlet orthogonal bases that is adapted to the estimation of geometrically regular im- 
ages. 



4. Best basis image estimation and bandlets 

4.1. Thresholding in a single basis 

When the dictionary S^f^ is reduced to a single basis and there is thus no basis 
choice, Theorem[T] clearly applies and reduces to the classical thresholding Theorem of 
Donoho and Johnstone 1 12]. The corresponding estimator is thus the classical thresh- 
olding estimator which quadratic risk satisfies 




It remains "only" to choose which basis to use and how to define the space V/v with 
respect to a. 

Wavelet bases provide a first family of estimators used commonly in image pro- 
cessing. Such a two dimensional wavelet basis is constructed from two real functions, 
a one dimensional wavelet i/A and a corresponding one dimensional scaling function 0, 
which are both dilated and translated: 

/ A 1 [x-2jk\ . , 1 ^fx-2jk\ 

v^i^w = w'^y-^ ) ^^'^^"^^ = [-^ ) • 

Note that the index j goes to — oo when the wavelet scale 2^ decreases. For a suitable 
choice of Xf/ and (j), the family {y/j^k{x)} j^k is an orthogonal basis of L^([0, 1]) and the 
following family constructed by tensorization 

Vfki^) = ¥iki^u^2) = WjM (^i) 0i,^2fe), > 

is an orthonormal basis of the square [0, 1]^. Furthermore, each space 
Vj = Span{0^-;^^ (•^i)0j,^2fe)}^i,^2^ 

called approximation space of scale 2^, admits {Vn}o,i>jMM orthogonal ba- 

sis. The approximation space of the previous section coincides with the classical 
wavelet approximation space Vj when N = 2~^/^. 

A classical approximation result ensures that for any function / C^, as soon as 
the wavelet has more than [aj + 1 vanishing moments, there is a constant C such 
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that, for any T, min^e^W \\Pv J - P.£f\? + Aim{^) < C{T^)^, and, for any ^, 
WPv^f-ff < CN-". ForN = 2-^/2 with = [2j,V+'^], Theorem[I]thus impUes 

E[\\f-Ff]<C{\log{o)\o')^ . 

This is up to the logarithmic term the best possible rate for functions. Unfortunately, 
wavelets bases do not provides such an optimal representation for the geometrically 
regular functions specified by Definition [T] Wavelets fail to capture the geometrical 
regularity of edges: near them, the wavelets coefficients remain large. As explained in 
Mallat [17], by noticing that those edges contribute at scale 2^ to 0{2~j) coefficients 
of order 0(2^/2), one verifies that the rate of convergence in a wavelet basis decays like 
(I log(cr) I cr^) which is far from the asymptotical minimax rate. 

A remarkably efficient representation was introduced by Candes and Donoho [5]. 
Their curvelets are not isotropic like wavelets but are more elongated along a prefer- 
ential direction and have two vanishing moments along this direction. They are dilated 
and translated like wavelets but they are also rotated. The resulting family of curvelets 
= is not a basis of L^([0, 1]^) but a tight frame of L^(R^). This implies, nev- 
ertheless, that for any / G L^([0, 1]^) 

X \{f,Cn)\'=A\\ff withA>l. 

Although this is not an orthonormal basis, the results of Section [3] can be extended 
to this setting. Projecting the data on the first N = G~^^^ curvelets with significant 
intersection with the unit square and thresholding the remaining coefficients with a 
threshold A V^ogTV a yields an estimator F that satisfies 

E[\\f-Ff]<C{\logo\o')^ 



with a constant C that depends only on /. 
up to the logarithmic factor for a G [1,2]. 
achieve a similar result for a larger than 2. 



This is the optimal decay rate for the risk 
No such fixed representation is known to 



4.2. Dictionary of orthogonal bandlet bases 

To cope with a geometric regularity of order a > 2, one needs basis elements which 
are more anisotropic than the curvelets, are more adapted to the geometry of edges and 
have more vanishing moments in the direction of regularity. Bandlet bases[15, 16, 19] 
are orthogonal bases whose elements have such properties. Their construction is based 
on the observation that even if the wavelet coefficients are large in the neighbourhood 
of an edge, these wavelets coefficients are regular along the direction of the edge as 
illustrated by Fig[T] 

To capture this geometric regularity, the key tool is a local orthogonal transform, in- 
spired by the work of Alpert 1 1], that combines locally the wavelets along the direction 
of regularity, represented by arrows in the rightmost image of Fig[T]), to produce a new 
orthogonal basis, a bandlet basis. By construction, the bandlets are elongated along 
the direction of regularity and have the vanishing moments along this direction. The 
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Figure 1: a) a geometrically regular image, b) the associated wavelet coefficients, c) a close-up of wavelet 
coefficients in a detail space Wj' that shows their remaining regularity, d) the geometrical flow adapted to this 
square of coefficients, here it is vertically constant and parametrized by a polynomial curve y 
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(possibly large) wavelets coefficients are thus locally recombined along this direction, 
yielding more coefficients of small amplitudes than before. 

More precisely, the construction of a bandlet basis of a wavelet multiresolution 
space Vj = Span{0j y^^ y^2}^i,^2 starts by decomposing this space into detail wavelet 
spaces 

y. = with Wf = Span{i/A,^,^,,J,^,,, . 

o,l>j 

For any level / and orientation o, the detail space Wf is a space of dimension (2~^)^. Its 
coefficients are recombined using the Alpert transform induced by some directions of 
regularity. This geometry is specified by a local geometric flow, a vector field meant to 
follow the geometric direction of regularity. This geometric flow is further constraint 
to have a specific structure as illustrated in Fig. |2l It is structured by a partition into 
dyadic squares in which the flow, if there exists, is vertically or horizontally constant. 
In each square of the partition, the flow being thus easily parametrized by its tangent. 

For each choice of geometric flow, a specific orthogonalization process [19] yields 
an orthogonal basis of bandlets that have vanishing moments along the direction of 
the geometric flow. This geometry should obviously be adapted to each image: the 
partition and the flow direction should match the image structures. This choice of 
geometry can be seen as an ill posed problem of estimation of the edges or of the 
direction of regularity. To avoid this issue, the problem is recasted as a best basis 
search in a dictionary. The geometry chosen is the one of the best basis. 

The first step is to define a dictionary ^(2-j)2 of orthogonal bandlet bases of Vj or 
equivalently a dictionary of possible geometric flows. Obviously this dictionary should 
be finite and this require a discretization of the geometry. As proved by Peyre and 
Mallat [l l9l] . this is not an issue: the flow does not have to follow exactly the direction 
of regularity but only up to a sufficient known precision. It is indeed sufficient to 
parametrize the flow in any dyadic square by the tangent of a polynomial of degree p 
(the number of vanishing moments of the wavelets). The coefficients of this polynomial 
can be further quantized. The resulting family of geometric flow in a square is of size 
0{2-jP). 

A basis of the dictionary ^(2-j)2 is thus specified by a set of dyadic squares par- 
titions for each details spaces Wf, I > j, and, for each square of the partition, a 
flow parametrized by a direction and one of these 0{2~^p) polynomials. The number 
of bases in the dictionary ^(2-j)2 grows exponentially with 2~^, but the total num- 
ber of different bandlets ^(2-7)2 grows only polynomially like 0(2"^^^+^)). Indeed 
the bandlets in a given dyadic square with a given geometry are reused in numerous 
bases. The total number of bandlets in the dictionary is thus bounded by the sum 
over aU 0{2-^j) dyadic squares and all (9(2 ^p^) choices for the flow of the number 
of bandlets in the square. Noticing that (2~^)^ is a rough bound of the number of 
bandlets in any subspaces of Vj, we obtain the existence of a constant Ck such that 

4.3. Approximation in bandlet dictionaries 

The key property of the bandlet basis dictionary is that it provides an asymptotically 
optimal representation of geometrically regular functions. Indeed Peyre and Mallat 
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(l^ prove 



Theorem 2. Let a < p where p in the number of wavelet vanishing moments, for any 
f geometrically regular function, there exists a real number C such that for any 
T>Oand V < T 

min ||/-P^^^^^^/|p + dim(^^,^,r)7^' ^Cr^^Z^^^'^ (4) 

where the sub space ^^jj is the space spanned by the vectors of ^ whose inner 
product with f is larger than T. 

This Theorem gives the kind of control we require in Theorem[T] 
Being able to perform efficiently the minimization of the previous Theorem is very 
important to exploit numerically this property. It turns out that a fast algorithm can 
be used to find the best basis that minimizes ||/ — Pj^^^^f\\^ + dim {.M^sj^t) or 
equivalently \\Pvjf — P ^ 7^ / 1 P + dim (^^jj) . We use first the additive structure 
with respect to the subband Wf of this "cost" ||/V,/ -/V^^y^/lP +dim {^^j^t) 
to split the minimization into several independent minimizations on each subbands. A 
bottom-top fast optimization of the geometry (partition and flow) similar to the one 
proposed by Coifman and Wickerhauser |8], and Donoho |9] can be performed on 
each subband thanks to two observations. Firstly, for a given dyadic square, the limited 
number of possible flows is such that the best flow can be obtained with a simple 
brute force exploration. Secondly, the hierarchical tree structure of the partition and 
the additivity of the cost function with respect to the partition implies that the best 
partition of a given dyadic square is either itself or the union of the best partitions of 
its four dyadic subsquares. This leads to a bottom up optimization algorithm once the 
best flow has been found for every dyadic squares. Note that this algorithm is adaptive 
with respect to a: it does not require the knowledge of the regularity parameter to be 
performed. 

More precisely, the optimization algorithm goes as follows. The brute force search 
of the best flow is conducted independently over all dyadic squares and all detail spaces 
with a total complexity of order 0(2"^^^+^)). This yields a value of the penalized 
criterion for each dyadic squares. It remains now to find the best partition. We proceed 
in a bottom up fashion. The best partition with squares of width smaller than 2^^^ 
is obtained from the best partition with squares of width smaller than 2^ : inside each 
dyadic square of width 2^+^ the best partition is either the partition obtained so far 
or the considered square. This choice is made according to the cost computed so far. 
Remark that the initialization is straightforward as the best partition with square of size 
1 is obviously the full partition. The complexity of this best partition search is of order 
0{2~^j) and thus the complexity of the best basis is driven by the best flow search 
whose complexity is of order 0(2"^^^+^^), which nevertheless remains polynomial in 
2-j. 

4.4. Bandlet estimators 

Estimating the edges is a complex task on blurred function and becomes even much 
harder in presence of noise. Fortunately, the bandlet estimator proposed by Peyre et al. 
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do not rely on such a detection process. The chosen geometry is obtained with the 
best basis selection of the previous section. This allows one to select an efficient basis 
even in the noisy setting. 

Indeed, combining the bandlet approximation result of Theorem [2] with the model 
selection results of Theorem [T] proves that the selection model based bandlet estimator 
is near asymptotically minimax for geometrically regular images. 

For a given noise level O", one has to select a dimension N = (2~^ )^ and a threshold 
T. The best basis algorithm selects then the bandlet basis ^ amongst &n = ^(i-jf 
that minimizes 

\\Py^X - P^^^^^.xf + r^dim (^^,x,r) 
and the model selection based estimate is F = Pj^^^^X. We should now specify the 
choice of N = (2~^)^ and T in order to be able to use Theorem [T] and Theorem [2] to 
obtain the near asymptotic minimaxity of the estimator. On the one hand, the dimen- 
sion N should be chosen large enough so that the unknown linear approximation error 
11/ — Py^W^ is small. One the other hand, the dimension N should not be too large so 

that the total number of bandlets Kf^, which satisfies V^^^^"^^ < Kn < CkVn^^^^\ 
imposing a lower bound on the value of the threshold remains small. For the sake of 
simplicity, as we consider an asymptotic behaviour, we assume that a is smaller than 
1/4. This implies that it exists j < such that G G (2^~\2^] The following theorem 
proves that choosing N = 2~^j and T = A Y^|logO"|(J with A large enough yields a 
nearly asymptotically minimax estimator. 

Theorem 3. Let a < p where p in the number of wavelet vanishing moments and 
let Ko G N* and A > y^2(pT~^supf^yf^^^{K). For any geometrically regular 
function /, there exists C > such that for any 

a < min(^,max(Q,/ro/2)-'/(^+4)), 

if we let N = 2~^-^ with j such that a G (2-^~\2-^] and T = Xy^\logo~\G, the estimator 
F — Pj^^^ obtained by thresholding Py^X with a threshold T in the basis ^ of S^j^ 
that minimizes 

\\Py^X - P^^^^^.xf + r^dim (^^,x,r) 

satisfies 

E[\\f-Ff] <C(|loga|<72)^. 
Theorem [3] is a direct consequence of Theorem [T] and Theorem |2l 
Proof For any a e {V'^ V] , observe that 2-^(^+4) <Kn = K^2-J)^ < C^2-^(^+4) 
thus (2(7)~(^+'^) < Kn < CkO'^p^^^ The restriction on a further implies then that 
Kn > Ko and Kn < a'^^P^'^l As A > ^2(^+4) sup^>^^ Ao(^), T = A V^Togo^a > 
A Y^log(^iv)(7 with A > Ao(^iv) so that Theorem [T] applies. This yields 

E[\\f-Ff]<{l + e)min{\\f-P^ff + T^dim{^))+^a^ . (5) 
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Now as r > 2^ , Theorem|2] applies and there is a constant C independent of T such that 
min (||/-P^/||2 + r2dim(^))<C(r2)«/(«+i) . 

Plugging this bound into © gives the result. □ 

The estimate F = X is computed efficiently by the same fast algorithm used 
in the approximation setting without requiring the knowledge of the regularity parame- 
ter a. The model selection based bandlet estimator is thus a tractable adaptive estima- 
tor that attains, up to the logarithmic term, the best possible asymptotic minimax risk 
decay for geometrically regular function. 

Although Theorem[3]applies only to geometrically regular function, one can use 
the bandlet estimator for any type of images. Figure [3] illustrates the good behaviour 
of the bandlet estimator for natural images already shown in [20] . Each line presents 
the original image, the degraded noisy image and two estimations, one using classi- 
cal translation invariant estimator [6] . and the other using the bandlet estimator. The 
bandlet improvement with respect to the classical wavelet estimator can be seen nu- 
merically as well as visually. The quadratic error is smaller with the bandlet estimator 
and the bandlets preserve much more geometric structures in the images. 



A. Proof of Theorem [T] 

Concentration inequalities are at the core of all the selection model estimators. Es- 
sentially, the penalty should dominate the random fluctuation of the minimized quan- 
tity. The key lemma. Lemma [21 uses a concentration inequality for Gaussian variable 
to ensure, with high probability, that the noise energy is small simultaneously in all the 
subspaces J^i spanned by a subset / of the Kjq different vectors, denoted by gk, of 

Lemma 2. For all u>0, with a probability greater than or equal to 1 — 2/Kf^e~^, 
yic{l,...,KN}and^i = Sp^n{gk}ka. \\P^i^\\ < V41og(^iv)dim(^/) +2w 

where dim is the dimension of^j. 

Proof. The key ingredient of this proof is a concentration inequality. Tsirelson's Lemma i22|] 
implies that for any 1-Lipschitz function : ^ C (|0(x) — 0(j)| < — j||) if W is 
a Gaussian standard white noise in then 

F{(t){W) > E [(t){W)] -^t} < e~'^^^ . 

For any space f ^ \\Pj^f\\ is 1-Lipschitz. Note that one can first project / 
into the finite dimensional space V/v without modifying the norm. We can thus apply 
Tsirelson's Lemma with t — Y^41og(^iv)dim(^) +2w and obtain 

P { \\PjiW\\ > E [\\P.^W\\] + V4log{KN)dim{^) + 2u} < ^;2dim(^)^-„ _ 
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Figure 3: Comparison between the translation invariant wavelet estimator and the bandlet estimator. The 
number within parenthesis is the PSNR defined by — ^Iglog ( ^ ) ^^^^ larger the better). 



Now as E [\\P^W\\] < {E [\\P^Wf] = v'dim(^), one derives 

> Vdim(^) + V41og(/^^)dim(^) + 2w} < K'^"^^^^' 



Now 



^{3/c{l,...,MJI^^/W|| > Vdim(^/) + V41og(/^iv)dim(^/) +2^} 

< L S'lll^^/^ll > \/dim(^/) + V41og(/^iv)dim(^/) + 2w| 
/c{l,...,^^} 

^ ^-2dim(^/)^_M 
/C{1,...,^A^} 



TV ^ 



and thus 



^{3/ C {1, . . . > Vdim(^/) + y^4log{KN)dim{^i)^2u} 



^ 2 



□ 



The proof of Theorem [T] follows from the definition of the best basis, the oracle 
subspace and the previous Lemma. 

Proof of TheoremU] Recall, that /V^X = Py^f ^ oPv^^W e Vn with Py^W sl Gaussian 
white noise. By construction, the thresholding estimate is P //^^ X where 

J = arg min ||/V^X-/V^^^X|p + dim(^^,x,r) • 

To simplify the notation, we denote by M and dim ^ the corresponding space and 
its dimension. 

Denote now dim(^o) the dimension of the oracle subspace that has been 
defined as the minimizer of 

\\PvJ-Pjtff^d:xm{Jt)T^ . 

By construction, 

||/V^X-P^X|p + A2log(/r^)a2dim(^) < \\Py^X -Pj^jf ^X^\og(K^)c^ d:xm{Jl^) . 
Using 

\\Pv,X-P^^f=\\Pv,X-Pvjf-.\\PyJ-PjKf-.2{Pv,X-PvJ,PvJ-Pj^^ 
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and a similar equality for ||/Viv^ — P.£o^^' ''"^ obtains 

\\PvJ-Pj^\\^ + X^\oz{KN)c'^d:mv (^) < ||/V„/-P^^/||2 + A2log(^w)c72dim(^o) 

One should now focus on the bound on the scalar product : 
\2{Pv^X-PvJ,P-^X-Pj,J)\ 

^2a||/'^^.iy||(||P^X-/V,/|| + ||/V,/-Wll) 



and, using Lemma[2l with a probability greater than or equal to 1 — " 



< 2cj dim (^^^ + dim (^o) + y41og(7^iv)(dim +dim(^o))+2w 
x(||P^-Pv^/|| + ||Pv^/-P^^/||) 
applying Ixy < ^~^x^ + jS^j^ successively with = ^ and j3 = 1 leads to 
|2(/V^X-/V;,/,P^X-P^^/)| 

-2 



^ j 2cT^(dim (^^^ +dim(^o) +41og(^:Ar)(dim (^^j +dim(^o)) +2m) 

+ 2(||/'^-/V;,/|P+||/V;,/-/^^o/ll') • 

Inserting this bound into 

\\PvJ -P,J<-\? + XHog{KN)a^ dim < \\PvJ - P^jf + \og{Kr,)a^ dim (^o) 

+ |2{/V^X-/V^/,P^-P^„/)| 

yields 

h\Pyj -P jLf <I^P^J -Pj^jf ^a^{X^^^ 

+ (7^(8(1 +41og(^iv)) -A^log(/riv)) dim (^) + 16a^w 



So that if > 32 + 

ll/V^Z-Z'^lP < 3||/V;v/-^^o/lP + 4<7^^^1og(^w)dim(^o) + 32cr2„ 
which implies 

- f < 4( - + a^A^ log(/s:^) dim (^o)) + 32a2M 
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where this result holds with probability greater than or equal to 1 — 
Recalling that this is valid for all u>0, one has 

¥{\\PvJ-PjXf-4{\\Pv^f-P^jf^G^:^Hog{KN^ > 32g\} < ^e'^ 

which implies by integration over u 

that is the bound of Theorem[T] 
E[\\Pv,f-PjXf] <4{\\Pv,f-P^jf + a^X^\og{KN)dim{-^o)) + ^2a^^ 

up to 11/ — Pyn/W^ which can be added on both size of the inequality. □ 
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